51 results found.
Written
Lexicon,
Language Type:
Multilingual
Languages:
English Aromanian; Arumanian; Macedo-Romanian Bulgarian Gheg Albanian Greek Mandarin Chinese Modern Standard Arabic Old Russian Polish Romano-Serbian
Availability:
available for research
License:
<Not Specified>
Size:
15.000 per language-pair entries
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
Paper:
N/A
Documentation:
None
Written
Lexicon,
Language Type:
Multilingual
Languages:
Bulgarian English Finnish German French
Availability:
Freely Available
License:
CreativeCommons
Size:
117000 words
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
Paper:
N/A
Documentation:
<Not Specified>
Language Type:
Multilingual
Languages:
Bulgarian Croatian Gheg Albanian Mandarin Chinese Standard Arabic
Availability:
Freely Available
License:
<Not Specified>
Size:
55k queries; 196k documents
Production Status:
Newly created-in progress
Use:
Knowledge Discovery/Representation
Paper:
N/A
Documentation:
<Not Specified>
Speech
Corpus,
Language Type:
Multilingual
Languages:
Bulgarian
Availability:
From Owner
License:
Free for scientific use
Size:
32 hours
Production Status:
Newly created-finished
Use:
Speech Recognition/Understanding
Paper:
N/A
Documentation:
<Not Specified>
Written
Term mapper,
Language Type:
Multilingual
Languages:
Bulgarian English German Latvian Lithuanian
Availability:
From Owner
License:
CC-NC-SA (http://creativecommons.org/licenses/by-nc-sa/3.0/)
Size:
127 KB for the tool alone; all available resources together take up to 4 GB
Production Status:
Newly created-finished
Use:
Term mapping
Paper:
N/A
Documentation:
https://github.com/pmarcis/mp-aligner
Written
Corpus,
Language Type:
Multilingual
Languages:
Bulgarian Czech English Hungarian Romanian
Availability:
From Data Center(s)
License:
ELRA
Size:
75 MByte
Production Status:
Existing-used
Use:
POS Induction
Paper:
N/A
Documentation:
English
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Basque Bulgarian Danish Dutch English Estonian German Hungarian Irish Italian Portuguese Russian Serbian Slovenian Spanish
Availability:
Freely Available
License:
Size:
3 MByte
Production Status:
Newly created-in progress
Use:
Lexicon Creation/Annotation
- Paper title: A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
- Paper track: Evaluation/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Sina Ahmadi | Monolingual Word Sense Alignment | /N |
Documentation:
None
Written
Treebank,
Language Type:
Monolingual
Languages:
Afrikaans Akkadian Amharic Ancient Greek Arabic Armenian Assyrian Bambara Basque Belarusian Bhojpuri Breton Bulgarian Buryat Cantonese Catalan Chinese Classical Chinese Coptic Croatian Czech Danish Dutch English Erzya Estonian Faroese Finnish French Galician German Gothic Greek Hebrew Hindi Hindi English Hungarian Indonesian Irish Italian Japanese Karelian Kazakh Komi Permyak Komi Zyrian Korean Kurmanji Latin Latvian Lithuanian Livvi Maltese Marathi Mbya Guarani Moksha Naija North Sami Norwegian Old Church Slavonic Old French Old Russian Persian Polish Portuguese Romanian Russian Sanskrit Scottish Gaelic Serbian Skolt Sami Slovak Slovenian Spanish Swedish Swedish Sign Language Swiss German Tagalog Tamil Telugu Thai Turkish Ukrainian Upper Sorbian Urdu Uyghur Vietnamese Warlpiri Welsh Wolof Yoruba
Availability:
Freely Available
License:
Various
Size:
25 million words
Production Status:
Existing-updated
Use:
Parsing and Tagging
- Paper title: Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joakim Nivre | Universal Dependencies | /N |
Documentation:
https://universaldependencies.org
Written
Lexicon,
Language Type:
Multilingual
Languages:
Bulgarian Catalan Chinese Dutch English Estonian Finnish Italian Portuguese Slovenian Spanish Swedish Thai Turkish
Availability:
Freely Available
License:
Open Source
Size:
41 411 senses for Bulgarian, 35 820 for Swedish
Production Status:
Newly created-in progress
Use:
Word Sense Disambiguation
- Paper title: A Parallel WordNet for English, Swedish and Bulgarian
- Paper track: Written/poster presentation with demo
- Paper status: Accept Poster+Demo
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Krasimir Angelov | GF WordNet | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Bulgarian Catalan Croatian Czech Danish Dutch English Estonian Filipino Finnish French German Greek Hebrew Hindi Hungarian Indonesian Italian Japanese Korean Latvian Lithuanian Malay Norwegian Persian Polish Portuguese Romanian Russian Serbian Simplified Chinese Slovak Slovenian Spanish Swedish Thai Traditional Chinese Turkish Ukrainian Vietnamese
Availability:
Freely Available
License:
CC-BY-SA
Size:
60 GByte
Production Status:
Newly created-in progress
Use:
Language Modelling
- Paper title: Wiki-40B: Multilingual Language Model Dataset
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rami Al-Rfou | Wiki40B-LM | /N |
Documentation:
None




